LLVM verification #356

AFOliveira · 2024-12-13T13:48:23Z

Following up on #258.

I've been doing work on what @lenary proposed as a first approach, I think this is an ok first mock-up. Still a WIP since many instructions still have bugs, but if you have any comments or recommendations, please LMK :).

I did not add LLVM as a submodule yet because it may be easier licensing wise to just point at it in the script? Usage is python3 riscv_parser.py <tablegen_json_file> <arch_inst_directory>"). I've also used jq to enhance readibility on the output of llvm-tblgen -I llvm/include -I llvm/lib/Target/RISCV llvm/lib/Target/RISCV/RISCV.td --dump-json -o <path-to-json-output>.

lenary

I think this is a good start, and is heading in the right direction. Some immediate comments before I look again in the next few days.

Is there anything specific you want feedback on?

Which instructions are you running into issues with so far?

Another overall thought I have is: if this is a test/validation suite, could this be structured using pytest? For instance, you could parse the YAML and the JSON to match up instruction descriptions, and use that to create parameterized fixtures (one instance per matched description) - the advantage of this is that you get all of the niceness of pytest's asserts, and pytest's test suite reports, without having to reimplement all the testcase management. This also makes it easier to split the test code that checks the encoding matches from tests that check the assembly strings match (for example, but we can think of others).

lenary · 2024-12-13T14:15:50Z

ext/auto-inst/parsing.py

+        output_stream.write("-" * 20 + "\n")
+        output_stream.write(f"Name:              {name}\n")
+        output_stream.write(f"Assembly Format:   {safe_get(data, 'AsmString', 'N/A')}\n")
+        output_stream.write(f"Size:              {safe_get(data, 'Size', 'N/A')} bytes\n")


If an instruction doesn't have a Size, it cannot be encoded, so it's likely not of interest to checking the encoding.

lenary · 2024-12-13T14:21:53Z

ext/auto-inst/parsing.py

+        output_stream.write(f"Commutable:        {'Yes' if safe_get(data, 'isCommutable', 0) else 'No'}\n")
+        output_stream.write(f"Memory Load:       {'Yes' if safe_get(data, 'mayLoad', 0) else 'No'}\n")
+        output_stream.write(f"Memory Store:      {'Yes' if safe_get(data, 'mayStore', 0) else 'No'}\n")
+        output_stream.write(f"Side Effects:      {'Yes' if safe_get(data, 'hasSideEffects', 0) else 'No'}\n")


At some point we probably need a good discussion about these and how we verify them.

Broadly:

isCommutable will only be set if we can teach LLVM how to commute the instruction - using a hook implemented in C++. We do this quite a lot for e.g. inverting conditions. This is not something the assembler will do, it's done earlier, in code generation though.

mayLoad and mayStore should be accurate enough, but I guess you'll need to statically analyse the pseudocode to work out if a load or store happens. We really only model loads/stores to conventional memory (and not, e.g., loads for page table walks or permission checks). I'm not sure there's any modelling of ordering. These are used to prevent code motion during code generation.

hasSideEffects is a catch all for "be very careful with this instruction during codegen", and usually points to other effects that LLVM doesn't model. Generic CSR accesses are part of this (except the floating point CSR, which I think we model correctly), but so are other effects I haven't thought hard about.

Do you think those should be explained in the UDB as well?

I think this probably needs a longer discussion on how denormalised this information should be. i.e., the information is there if you statically-analyse the pseudocode (which should absolutely be something we should be able to do with the pseudocode), but we probably don't want that to be the only way to find out this sort of thing as the pseudocode operations might be a large amount of code.

I don't think you should be exactly matching LLVM's internal representation, but I do think there is the opportunity to denormalise more information that might be generally useful for this kind of tool.

I agree that this needs a more appropriate discussion, l think @dhower-qc has been working on instruction representations lately, so he probably also has been thinking about this.

I don't have time for this discussion before mid-January.

Let's try get the things that are already in the YAML done first, i.e.:

Instruction Encodings

Extensions and Profiles

ext/auto-inst/parsing.py

lenary · 2024-12-13T14:24:43Z

ext/auto-inst/parsing.py

+    """
+    Attempt to find a matching key in json_data for instr_name, considering different
+    naming conventions: replacing '.' with '_', and trying various case transformations.
+    """


I really suggest using the name from AsmString rather than this.

ok, I'll change that, thanks for the suggestion!

AFOliveira · 2024-12-13T15:40:16Z

I think this is a good start, and is heading in the right direction. Some immediate comments before I look again in the next few days.

Thanks for your feedback!

Is there anything specific you want feedback on?

Not yet, the point was just some general considerations about the initial approach.

Which instructions are you running into issues with so far?

I still didnt find a pattern, but I'll try to fix this soon and if I run into any issue I'm not able to solve by myself, I'll try to bring it up, thanks!

Another overall thought I have is: if this is a test/validation suite, could this be structured using pytest? For instance, you could parse the YAML and the JSON to match up instruction descriptions, and use that to create parameterized fixtures (one instance per matched description) - the advantage of this is that you get all of the niceness of pytest's asserts, and pytest's test suite reports, without having to reimplement all the testcase management. This also makes it easier to split the test code that checks the encoding matches from tests that check the assembly strings match (for example, but we can think of others).

I'll take a look into pytest and see how to port this, thanks for the suggestion!

AFOliveira · 2024-12-19T09:29:54Z

current known bugs are:

Can't parse yaml complex imm encodings (i.e. 31|7|30-25|11-8)
Some instructions need to be fixed in the UDB, since this already caught some errors.
Need to split pytest into different unit tests rather than one big test.
Use ASMString from JSON.

I'm working on both now.

lenary · 2024-12-19T11:00:25Z

Somewhere, a __pycache__ and *.pyc is missing from a gitignore. :)

lenary · 2024-12-19T23:26:47Z

ext/auto-inst/parsing.py

+    instr_name = instr_name.lower().strip()
+
+    # Search through all entries in json_data
+    for key, value in json_data.items():


You can use the !instanceof metadata to only search through things with an encoding, I pointed this out in the comment about the initial approach. This will save you iterating through aliases (and other data like isel patterns)

From what I can see !instanceof follows this structure, but I'm finding it hard to understand how to use it for the parsing purposes, can you please further explain?

{ "!instanceof": { "AES_1Arg_Intrinsic": [ "int_arm_neon_aesimc", "int_arm_neon_aesmc" ], "AES_2Arg_Intrinsic": [ "int_arm_neon_aesd", "int_arm_neon_aese" ], "ALUW_rr": [ "ADDW", "ADD_UW", "DIVUW", "DIVW", "MULW", "PACKW", "REMUW", "REMW", "ROLW", "RORW", "SH1ADD_UW", "SH2ADD_UW", "SH3ADD_UW", "SLLW", "SRAW", "SRLW", "SUBW" ], "ALU_ri": [ "ADDI", "ANDI", "ORI", "SLTI", "SLTIU", "XORI" ],

So, let's assume you parsed the whole json object, into a python variable called json.

json["!instanceof"] is a map, from tablegen classes, to a list of definition names that are instances of that class (or its sub-classes). These definition names appear in the top-level json.

You would use code like the following:

for def_name in json["!instanceof"]["RVInstCommon"]: def_data = json[def_name] ...

This saves you having to look at all the tablegen data that is not an instruction (so an alias or a pattern or CSR or something).

Note you'll still have to look at isPseudo and isCodeGenOnly and potentially exclude items where one or both of those is true.

AFOliveira · 2024-12-22T14:58:18Z

Unit testing is now working.

TODO:
Solve complex immediates
Add the optimization @lenary proposed on code structure
Add this to the validation process

Signed-off-by: Afonso Oliveira <[email protected]>

AFOliveira · 2024-12-23T14:59:50Z

Can I insert LLVM as a submodule or would this be another licensing problem?

AFOliveira · 2024-12-23T15:52:31Z

This is functional. Things missing is how to address LLVM and adding it the test to Rakefile. I will also see how I can optimize code, e.g. what @lenary proposed on one review.

lenary · 2024-12-23T18:47:14Z

Can I insert LLVM as a submodule or would this be another licensing problem?

I highly suggest using environment variables to point to LLVM, and skipping these tests if those environment variables are not set.

AFOliveira · 2024-12-23T21:46:50Z

I had LLVM outside of singularity repo thus I was not able to do it, but if I clone inside and run it correctly I think that works as well, I'll set it up!

Signed-off-by: Afonso Oliveira <[email protected]>

AFOliveira · 2024-12-27T12:57:18Z

I believe every comment is now solved, including optimizations. I'm skipping atomic operations due to what I reported on #361, but that can be patched in the future for better completion.

AFOliveira requested a review from lenary December 13, 2024 13:48

AFOliveira force-pushed the AFOliveira/LLVM branch from a56b0b1 to d9b50b2 Compare December 13, 2024 14:13

lenary reviewed Dec 13, 2024

View reviewed changes

AFOliveira mentioned this pull request Dec 18, 2024

Instruction description mismatch #360

Open

lenary reviewed Dec 19, 2024

View reviewed changes

This was referenced Dec 23, 2024

Acquire and release instructions have a bug #361

Open

Fix CSRRS #376

Open

AFOliveira and others added 14 commits December 23, 2024 12:59

Add simple Docker environment variable

7f82b46

Signed-off-by: Afonso Oliveira <[email protected]>

Fix errors due to incorrect parsing of VM

7141a9c

Signed-off-by: Afonso Oliveira <[email protected]>

First Refactor to pytest

6e45c3b

Signed-off-by: Afonso Oliveira <[email protected]>

Allow 16 bit instructions for C extension

ac04c28

Signed-off-by: Afonso Oliveira <[email protected]>

Revert bad parsing

f1b8613

Signed-off-by: Afonso Oliveira <[email protected]>

Allow only one value

e3e7456

Signed-off-by: Afonso Oliveira <[email protected]>

Use AsmString instead of name

4355eb0

Signed-off-by: Afonso Oliveira <[email protected]>

Small Refactor on parsing.py

8492581

refactor to do unit tests

3bd1d2c

refactor to file name

6b9fdda

Modify to have seveal Unit tests instead of just one

831a10b

Signed-off-by: Afonso Oliveira <[email protected]>

Clean up and code reorganization

655e1d6

Signed-off-by: Afonso Oliveira <[email protected]>

Ensure it is not pseudo

d61fb3b

Signed-off-by: Afonso Oliveira <[email protected]>

Skip aq/rl instructions

5992124

Signed-off-by: Afonso Oliveira <[email protected]>

AFOliveira force-pushed the AFOliveira/LLVM branch from 956ad97 to 5992124 Compare December 23, 2024 12:59

add pytest to requirements

c2e6e92

Signed-off-by: Afonso Oliveira <[email protected]>

AFOliveira added 4 commits December 27, 2024 10:56

add LLVM path as environment variable

7c9c65b

Signed-off-by: Afonso Oliveira <[email protected]>

remove and ignor python cache

3178538

Signed-off-by: Afonso Oliveira <[email protected]>

Add LLVM test to the Rakefile

11d4c60

Signed-off-by: Afonso Oliveira <[email protected]>

Optimizations to test logic and modify test order

2185239

Signed-off-by: Afonso Oliveira <[email protected]>

AFOliveira marked this pull request as ready for review December 27, 2024 12:40

AFOliveira requested a review from dhower-qc as a code owner December 27, 2024 12:40

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

LLVM verification #356

LLVM verification #356

AFOliveira commented Dec 13, 2024

lenary left a comment

lenary Dec 13, 2024

lenary Dec 13, 2024

AFOliveira Dec 13, 2024

lenary Dec 13, 2024

AFOliveira Dec 14, 2024 •

edited

Loading

lenary Dec 15, 2024

lenary Dec 13, 2024

AFOliveira Dec 13, 2024

AFOliveira commented Dec 13, 2024

AFOliveira commented Dec 19, 2024 •

edited

Loading

lenary commented Dec 19, 2024

lenary Dec 19, 2024

AFOliveira Dec 20, 2024

lenary Dec 20, 2024

AFOliveira commented Dec 22, 2024

AFOliveira commented Dec 23, 2024

AFOliveira commented Dec 23, 2024 •

edited

Loading

lenary commented Dec 23, 2024

AFOliveira commented Dec 23, 2024

AFOliveira commented Dec 27, 2024 •

edited

Loading

LLVM verification #356

Are you sure you want to change the base?

LLVM verification #356

Conversation

AFOliveira commented Dec 13, 2024

lenary left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

AFOliveira Dec 14, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

AFOliveira commented Dec 13, 2024

AFOliveira commented Dec 19, 2024 • edited Loading

lenary commented Dec 19, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

AFOliveira commented Dec 22, 2024

AFOliveira commented Dec 23, 2024

AFOliveira commented Dec 23, 2024 • edited Loading

lenary commented Dec 23, 2024

AFOliveira commented Dec 23, 2024

AFOliveira commented Dec 27, 2024 • edited Loading

AFOliveira Dec 14, 2024 •

edited

Loading

AFOliveira commented Dec 19, 2024 •

edited

Loading

AFOliveira commented Dec 23, 2024 •

edited

Loading

AFOliveira commented Dec 27, 2024 •

edited

Loading